Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add sort integration benchmark #13306

Merged
merged 3 commits into from
Nov 15, 2024
Merged

Add sort integration benchmark #13306

merged 3 commits into from
Nov 15, 2024

Conversation

2010YOUY01
Copy link
Contributor

Which issue does this PR close?

Closes #.

Rationale for this change

I noticed there is no benchmark to test sorting the whole relational table: existing sort benchmark is only for a single SortExec, and this can't test how would end-to-end large sort query scale to multiple CPU cores. With integration test, it is possible to see the combined performance of local sort on small batches and final step multi way sort-preserving merge.

The benchmark includes 10 queries to sort the entire lineitem table in TPCH dataset, with different characteristics. For example: with different number of sort key/ payload columns, different sort key types and cardinality, etc. Also it is easy to add more benchmark queries.
More details see sort_integration.rs

What changes are included in this PR?

Added a single benchmark binary for benchmark. It can be executed with:

# Under benchmarks/
./bench.sh run sort_integration
Q1 iteration 0 took 211.0 ms and returned 6001215 rows
Q1 iteration 1 took 186.7 ms and returned 6001215 rows
Q1 iteration 2 took 184.2 ms and returned 6001215 rows
Q1 iteration 3 took 185.4 ms and returned 6001215 rows
Q1 iteration 4 took 189.4 ms and returned 6001215 rows
Q1 avg time: 191.36 ms
Q2 iteration 0 took 156.9 ms and returned 6001215 rows
Q2 iteration 1 took 163.4 ms and returned 6001215 rows
Q2 iteration 2 took 166.2 ms and returned 6001215 rows
Q2 iteration 3 took 162.5 ms and returned 6001215 rows
Q2 iteration 4 took 169.5 ms and returned 6001215 rows
Q2 avg time: 163.70 ms
Q3 iteration 0 took 806.1 ms and returned 6001215 rows
Q3 iteration 1 took 812.8 ms and returned 6001215 rows
......

Query run results comparing sorting lineitem table with scaling factor 1 and 5

┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query        ┃ sort-lineitem-sf1 ┃ sort-lineitem-sf5 ┃        Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ Q1           │          186.49ms │         1244.18ms │  6.67x slower │
│ Q2           │          156.91ms │         1000.38ms │  6.38x slower │
│ Q3           │          804.57ms │         4682.39ms │  5.82x slower │
│ Q4           │          241.32ms │         1464.02ms │  6.07x slower │
│ Q5           │          407.56ms │         2037.05ms │  5.00x slower │
│ Q6           │          441.52ms │         2193.89ms │  4.97x slower │
│ Q7           │          786.11ms │         7000.62ms │  8.91x slower │
│ Q8           │          535.87ms │         2835.62ms │  5.29x slower │
│ Q9           │          532.31ms │         2957.57ms │  5.56x slower │
│ Q10          │          841.96ms │         9289.74ms │ 11.03x slower │
└──────────────┴───────────────────┴───────────────────┴───────────────┘

Are these changes tested?

Are there any user-facing changes?

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @2010YOUY01 -- I am sorry for the delay in reviewing

I found this PR really well commented and documented. ❤️

What would you think about adding this benchmark to the dfbench benchmark binary rather than creating an entirely new one?

Perhaps it can be SortIntegration or SortTPCH for a name?

https://github.com/apache/datafusion/blob/77f330c6a2b26f2d1d4d4bf11d456fad466316b4/benchmarks/src/bin/dfbench.rs#L45-L44

Each different binary has substantial overhead when building (and it is a different target to build, etc) so reducing the total number of binaries I think helps the overall CI time

/// - Thin variant: `l_partkey` column with `BIGINT` type (1 column)
/// - Wide variant: all columns except for possible key columns (12 columns)
const SORT_QUERIES: [&'static str; 10] = [
// Q1: 1 sort key (type: INTEGER, cardinality: 7) + 1 payload column
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thank you for your annotations here. 👍

@2010YOUY01 2010YOUY01 marked this pull request as draft November 14, 2024 13:22
@2010YOUY01 2010YOUY01 marked this pull request as ready for review November 15, 2024 04:47
@2010YOUY01
Copy link
Contributor Author

@alamb Thank you, this makes sense to me. Updated.

@alamb alamb merged commit 5ea1d31 into apache:main Nov 15, 2024
25 checks passed
@alamb
Copy link
Contributor

alamb commented Nov 15, 2024

Thank you very much @2010YOUY01 -- very cool

@2010YOUY01 2010YOUY01 deleted the sort-bench branch November 16, 2024 03:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants